An anterior pathway, concerned with extracting meaning from 
sound, has been identified in nonhuman primates. An analogous 
pathway has been suggested in humans, but controversy exists 
concerning the degree of lateralization and the precise location 
where responses to intelligible speech emerge. We have demon- 
strated that the left anterior superior temporal sulcus (STS) re- 
sponds preferentially to intelligible speech (Scott SK, Blank CC, 
Rosen S, Wise RJS. 2000. Identification of a pathway for intelligible 
speech in the left temporal lobe. Brain. 123:2400-2406.). A func- 
tional magnetic resonance imaging study in Cerebral Cortex used 
equivalent stimuli and univariate and multivariate analyses to argue 
for the greater importance of bilateral posterior when compared 
with the left anterior STS in responding to intelligible speech 
(Okada K, Rong F, Venezia J, Matchin W, Hsieh IH, Saberi K, Serences JT, 
Hickok G. 2010. Hierarchical organization of human auditory 
cortex: evidence from acoustic invariance in the response to 
intelligible speech. Cereb Cortex. 20:2486-2495.). Here, we also replicate our 
original study, demonstrating that the left anterior STS exhibits the 
strongest univariate response and, in decoding using the bilateral 
temporal cortex, contains the most informative voxels showing an 
increased response to intelligible speech. In contrast, in classifi- 
cations using local "searchlights" and a whole brain analysis, we 
find greater classification accuracy in posterior rather than anterior 
temporal regions. Thus, we show that the precise nature of the 
multivariate analysis used will emphasize different response pro- 
files associated with complex sound to speech processing. 

Keywords: fMRI, intelligibility, multivariate pattern analysis, speech 
perception, superior temporal sulcus 



Introduction 

Studies in humans and nonhuman primates suggest that audi- 
tory information is processed hierarchically, with primary 
auditory cortex (PAC) responding in a relatively nonselective 
fashion, and lateral regions responding selectively to stimuli 
of greater complexity (Rauschecker 1998; Kaas and Hackett 
2000; Wessinger et al. 2001; Davis and Johnsrude 2003). In 
the monkey, anatomical and functional evidence supports the 
notion of an anterior "what" pathway that is sensitive to 
different conspecific communication calls (Rauschecker 1998; 
Tian et al. 2001). A similar pathway has been suggested in 
humans, but there is disagreement concerning the extent to 
which it is left lateralized, and as to whether intelligible 
speech is processed in predominantly anterior, posterior, or 
both temporal fields (Scott et al. 2000; Davis and Johnsrude 
2007; Hickok and Poeppel 2007; Rauschecker and Scott 2009; 
Peelle et al. 2010). 

Many functional imaging studies have attempted to isolate 
neural regions that are sensitive to intelligible speech com- 
pared with those regions that respond to acoustic complexity. 
The selection of a suitable baseline comparison condition has 
proved difficult due to the inherent acoustic complexity of the 
speech signal; low level auditory baselines such as tones and 
noise bursts make it difficult to distinguish between neural 
responses that are specific to speech, and those that are a con- 
sequence of the perception of a complex sound. Using 
rotated speech (Blesser 1972), which is well matched to 
speech in both spectral and amplitude variations, we have 
shown that the left anterior superior temporal sulcus (STS) re- 
sponds preferentially to intelligible speech (Scott et al. 2000; 
Narain et al. 2003). These previous studies employed univari- 
ate statistical analyses, which identified the regions in which 
there was a greater mean response to 2 different kinds of in- 
telligible speech [clear and noise-vocoded speech (Shannon 
et al. 1995)] when compared with 2 kinds of unintelligible 
sounds (rotated speech and rotated-noise-vocoded speech). 

Other researchers have suggested that the recognition of in- 
telligible speech arises in bilateral posterior STS (Okada and 
Hickok 2006; Hickok and Poeppel 2007; Hickok et al. 2009; 
Vaden et al. 2010). A recent study in Cerebral Cortex (Okada 
et al. 2010) replicated the Scott et al. (2000) methodology with 
functional magnetic resonance imaging (fMRI). The univariate 
analysis in the study showed widespread bilateral activation to 
the summation of clear and noise-vocoded speech relative to 
their unintelligible rotated equivalents. The authors then con- 
ducted a multivariate pattern analysis (Pereira et al. 2009) 
within small cube-shaped regions of interest (ROIs) at specific 
sites in the temporal cortex. This showed that the bilateral 
anterior and posterior STS (in addition to the right mid-STS) 
contained sufficient information to separate intelligible from 
unintelligible sounds. Two sets of classifications were per- 
formed, classifications in which the conditions differed in in- 
telligibility (e.g. clear vs. rotated speech and noise-vocoded 
speech vs. rotated-noise-vocoded speech) and those in which 
the conditions differed predominantly in spectral detail (clear 
vs. noise-vocoded speech and rotated speech vs. rotated-noise-vocoded 
speech). The left posterior and right mid-STS 
showed the greatest classification accuracy in discriminations 
of intelligibility when they were expressed relative to the accu- 
racy in discriminations of spectral detail. The left anterior STS 
successfully classified the contrasts of intelligibility, as well as 
one of the contrasts that differed in spectral detail (clear vs. 
noise-vocoded speech). This was interpreted as showing that 
the left anterior STS was unlikely to be a key region involved 
in resolving intelligible speech owing to its additional sensi- 
tivity in discriminating "spectral detail". 

Here, we also replicate the Scott et al. (2000) study in fMRI 
using univariate general linear modeling and multivariate 
pattern analysis. We conduct additional univariate and multi- 
variate analyses that allow a more complete description of the 
role of the bilateral anterior and posterior temporal cortices, 
and regions beyond the temporal lobe, in responding to intel- 
ligible speech. 



Materials and Methods 

Participants 

Twelve right-handed native English speakers with no known hearing 
or language impairments participated in the experiment (aged 18-38, 
mean age 25, 3 males). All participants gave informed consent. The 
experiment was performed with the approval of the local ethics com- 
mittee of the Hammersmith Hospital. 



Stimuli 

Stimuli were as described in Scott et al. (2000) and Narain et al. 
(2003). In brief, all stimuli were drawn from low-pass filtered (3.8 
kHz) digital representations of the Bamford-Kowal-Bench sentence 
corpus (Bench et al. 1979). There were 4 stimulus conditions: Natural 
speech (clear), noise-vocoded (NV), spectrally rotated (rot), and 
spectrally rotated-noise-vocoded speech (rotNV). 

The rotation of speech is achieved by inverting the frequency spec- 
trum around 2 kHz using a simple modulation technique; this retains 
spectral and temporal complexity, but makes the speech unintelligible 
(Blesser 1972). It has been described previously as sounding like an 
alien speaking your language but with different articulators (Blesser 
1972). It contains some phonetic features, for example, the presence 
of voicing, but these features do not generally give rise to intelligible 
sounds without significant training. A preprocessing filter was used to 
give the rotated speech approximately the same long-term average 
spectrum as the original, unrotated speech. 
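The frequency inversion can be illustrated with a minimal sketch. This is not the procedure used to create the stimuli: it assumes a hypothetical sampling rate of 8 kHz, so that modulating by half the sampling rate mirrors the spectrum about 2 kHz, and it omits the low-pass and spectrum-equalizing filters described above. All function names are illustrative.

```python
import math

def rotate_spectrum(samples):
    """Modulate by fs/2 (multiply by +1, -1, +1, ...), mapping f to fs/2 - f."""
    return [s if n % 2 == 0 else -s for n, s in enumerate(samples)]

def dominant_frequency(samples, fs):
    """Frequency (Hz) of the largest bin of a naive DFT, up to Nyquist."""
    n = len(samples)
    best_bin, best_power = 0, -1.0
    for k in range(n // 2 + 1):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        power = re * re + im * im
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * fs / n

FS = 8000  # assumed sampling rate (Hz); fs/4 = 2 kHz is then the rotation point
tone = [math.sin(2 * math.pi * 500 * i / FS) for i in range(160)]  # 500 Hz tone
rotated = rotate_spectrum(tone)

print(dominant_frequency(tone, FS), dominant_frequency(rotated, FS))
```

Under these assumptions a 500 Hz component moves to 3500 Hz, that is, to the mirror-image position about 2 kHz, while the temporal envelope of the signal is unchanged.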

Noise-vocoding involves passing the speech signal through a filter 
bank (in this case 6 filters) to extract the time-varying envelopes 
associated with the energy in each spectral channel. Envelope detec- 
tion occurred at the output of each analysis filter by half-wave rectifi- 
cation and low-pass filtering at 320 Hz. The extracted envelopes were 
then multiplied by white noise and combined after refiltering 
(Shannon et al. 1995). This retains the amplitude envelope cues 
within specified spectral bands, but removes spectral detail. With 6 
bands, the speech can be understood with a small amount of training. 
It sounds like a harsh whisper with only a weak sense of pitch. Sub- 
jects underwent a short training, as described in Scott et al. (2000), 
to ensure that they understood the noise-vocoded speech. The combi- 
nation of vocoding and rotation sounds like intermittent noise with 
weak pitch changes. It does not contain phonetic content and is not 
intelligible or recognizable even as "alien" speech. 
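The envelope-extraction step for a single channel can be sketched as follows. This is a simplified illustration under stated assumptions, not the stimulus-generation code: the sampling rate is hypothetical, a first-order low-pass filter stands in for whatever filter was actually used, uniform noise stands in for the white-noise carrier, and the band-pass analysis and refiltering stages are omitted.

```python
import math
import random

def envelope(band_signal, fs, cutoff_hz=320.0):
    """Half-wave rectify, then smooth with a one-pole low-pass filter."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    smoothed, state = [], 0.0
    for sample in band_signal:
        rectified = max(sample, 0.0)          # half-wave rectification
        state += alpha * (rectified - state)  # first-order low-pass
        smoothed.append(state)
    return smoothed

def noise_carrier(env, seed=0):
    """Multiply the envelope by uniform noise in [-1, 1] (assumed carrier)."""
    rng = random.Random(seed)
    return [e * rng.uniform(-1.0, 1.0) for e in env]

FS = 16000  # assumed sampling rate (Hz)
# Stand-in for the output of one analysis filter: a 1 kHz tone.
band = [math.sin(2 * math.pi * 1000 * i / FS) for i in range(1600)]
env = envelope(band, FS)
channel = noise_carrier(env)
```

In a full vocoder this would be repeated for each of the 6 channels, with each noise-carried channel refiltered into its band before summation.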

The clear and noise-vocoded speech are both intelligible, but the 
rotated and rotated-noise-vocoded speech are not, while clear and 
rotated speech contain more detailed spectral information than noise- 
vocoded and rotated-noise-vocoded speech. 



Functional Neuroimaging 

Subjects were scanned on a Philips (Philips Medical Systems, Best, 
The Netherlands) Intera 3.0-T MRI scanner using Nova Dual gradients, 
a phased-array head coil and sensitivity encoding (SENSE) with 
an underlying sampling factor of 2. Functional MRI images were acquired 
using a T2*-weighted gradient-echo planar imaging sequence, 
which covered the whole brain (repetition time: 10 s, acquisition 
time: 2 s, echo time (TE): 30 ms, flip angle: 90°). Thirty-two axial 
slices with a slice thickness of 3.25 mm and interslice gap of 0.75 mm 
were acquired in an ascending order (resolution: 2.19 x 2.19 x 4.00 
mm; field of view 280 x 224 x 128 mm). Quadratic shim gradients 
were used to correct for magnetic field inhomogeneities. T1-weighted images 
were acquired for all subjects (resolution = 1.20 x 0.93 x 0.93 mm). 
Participants listened to sounds within the scanner using an 
MR-compatible binaural headphone set (MR confon GmbH, Magdeburg, 
Germany). All the stimuli were presented using E-Prime software 
(Psychology Software Tools, Inc., Pittsburgh, PA, USA) installed 
on an MR interfacing integrated functional imaging system (Invivo 
Corporation, Orlando, FL, USA). 

Data were acquired using sparse acquisition, which ensured that 
the stimuli were presented in silence (Hall et al. 1999). Stimuli were 
presented during a 7.5-s MR silent period, which was followed by 
a 2-s image acquisition and a 0.5-s silence. Two runs of data were 
acquired, with each run consisting of 24 trials of each condition pre- 
sented in a pseudorandomized order (96 trials/volumes per run). 
A total of 192 trials/volumes were acquired for each subject. Each trial 
comprised three randomly selected unique sentences from one exper- 
imental condition with each sentence lasting <2 s in duration. Subjects 
listened passively to the sentences in the scanner and were instructed 
to try and understand each sentence. 
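The trial structure implied by these values can be checked with a few lines of arithmetic. The sketch below uses only the figures reported above; the variable names are illustrative.

```python
# Sparse-acquisition trial structure, using the values reported in the text.
SILENT_STIMULUS_PERIOD_S = 7.5    # stimuli presented in scanner silence
ACQUISITION_S = 2.0               # one whole-brain volume
POST_ACQUISITION_SILENCE_S = 0.5

CONDITIONS = ("clear", "NV", "rot", "rotNV")
TRIALS_PER_CONDITION_PER_RUN = 24
RUNS = 2
SENTENCES_PER_TRIAL = 3
MAX_SENTENCE_DURATION_S = 2.0     # each sentence lasted less than 2 s

trial_duration_s = (SILENT_STIMULUS_PERIOD_S + ACQUISITION_S
                    + POST_ACQUISITION_SILENCE_S)
volumes_per_run = len(CONDITIONS) * TRIALS_PER_CONDITION_PER_RUN
total_volumes = volumes_per_run * RUNS
run_duration_min = volumes_per_run * trial_duration_s / 60
max_speech_per_trial_s = SENTENCES_PER_TRIAL * MAX_SENTENCE_DURATION_S

print(trial_duration_s, volumes_per_run, total_volumes, run_duration_min)
```

Each trial therefore spans one 10-s repetition time, the three sentences fit within the 7.5-s silent period, and each run of 96 volumes lasts 16 min.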

Data Analysis 

Univariate Analysis 

Data were analyzed using Statistical Parametric Mapping (SPM8; 
http://www.fil.ion.ucl.ac.uk/spm/, last accessed March 25, 2013). Scans 
were realigned, unwarped, and spatially normalized to 2 mm³ isotropic 
voxels using the parameters derived from the segmentation of each 
participant's T1-weighted image, and smoothed with a Gaussian kernel of 
10-mm full-width at half maximum. A first-order finite impulse response 
(FIR) filter with a window length equal to the time taken to acquire a 
single volume (a box car function) was used to model the hemodynamic 
response. The 4 stimulus conditions (and 6 movement regressors 
of no interest) were entered into a general linear model at the first level. 
A 2 x 2 repeated-measures analysis of variance (ANOVA) with the factors 
intelligibility (+/-) and spectral detail (+/-) was conducted at the 
second level using the con images generated from the first level. The 
pairwise subtraction of all the conditions including the simple effects of 
(clear - rot) and (NV - rotNV) were examined using separate 
repeated-measures t-tests. All statistical maps were thresholded at peak level 
P < 0.001 (uncorrected) with a false discovery rate (FDR) correction of 
q < 0.05 at the cluster level. Activations were localized using SPM 
anatomy (Eickhoff et al. 2005). Follow-up analyses were conducted on 
functionally defined ROIs. Data were extracted from 7 x 7 x 7 voxel 
cubes (14 mm x 14 mm x 14 mm = 2744 mm³) using the Marsbar 
toolbox (http://marsbar.sourceforge.net/) (Brett et al. 2002). The 
location of these ROIs was defined by constructing ROIs around the 
peaks of the main positive effect of intelligibility in the anterior and 
posterior temporal cortex. Note that this does not constitute "double 
dipping" as these ROIs were used to examine between region differences; 
in a fully balanced design, any statistical bias will be equivalent 
between regions (see Kriegeskorte et al. (2009) supplementary materials 
and Friston et al. (2006)). 

Multivariate Pattern Analysis 

Pattern analysis involves using an algorithm to learn a function that 
distinguishes between experimental conditions using the pattern of 
voxel activity. Typically, data are divided into training and test sets; 
following a training phase, the success of the function is evaluated by 
assessing its ability to correctly predict the experimental conditions 
associated with previously unseen brain images. The underlying assumption 
is that if the function successfully predicts the experimental 
conditions from the previously withheld images at a level greater than 
chance, then there is information within those images concerning the 
conditions. Pattern analysis considers the pattern of activation across 
multiple voxels, and this allows weak information at each voxel to be 
accumulated such that voxels that do not carry information individu- 
ally can do so when jointly analyzed (Haynes and Rees 2006). Further- 
more, as classification does not necessarily require spatial smoothing, 
it can afford a very high spatial specificity. 

A linear support vector machine (SVM) is a discriminant function 
that attempts to fit a linear boundary separating data observed in differ- 
ent experimental conditions. When applied to fMRI analysis, an SVM 
attempts to fit a linear boundary that maximizes the distance between 
the most similar training examples from each condition within a multi- 
dimensional space with as many dimensions as voxels. These examples 
are referred to as the support vectors and are the training examples 
which are most difficult to separate. In the present study, they refer to 
the brain volumes where neural responses to intelligible and unintelligi- 
ble sounds are most closely matched. The separating boundary is the 
direction in the data of maximum discrimination. A weight vector lies 
orthogonal to this boundary and is the linear combination or weighted 
average of the support vectors. Every voxel receives a weight, with 
larger weights indicating voxels that contribute more to classification. 
Given a positive and a negative class (+1 = intelligible speech and -1 = 
unintelligible sounds), a positive weight for a voxel means that the 
weighted average in the support vectors for that voxel was higher for 
listening to intelligible speech when compared with unintelligible 
sounds, and a negative weight means that the weighted average was 
lower for intelligible speech relative to unintelligible sounds (Mourao- 
Miranda et al. 2005). The classification prediction (whether an unseen 
example is classified as belonging to the intelligible or unintelligible 
class), is achieved by summing the activation values at each voxel multi- 
plied by their associated weight value, and adding a bias term. If the 
resulting value is greater than zero, an unseen example will be classified 
(in the case of this experiment) as an intelligible speech trial, and if that 
value is less than zero it will be classed as an unintelligible trial. As the 
classification solution derived from SVMs is based on the whole spatial 
pattern, local inferences about single voxels should be interpreted only 
within the context of their contribution to a wider discriminating 
pattern. 
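The prediction rule described above amounts to a weighted sum plus a bias term. The sketch below illustrates it with a hypothetical 3-voxel weight vector; the weights, bias, and activation values are invented for illustration and are not taken from the data. The text does not specify what happens when the decision value is exactly zero; here such a case is arbitrarily assigned to the unintelligible class.

```python
def decision_value(activations, weights, bias):
    """Sum of voxel activations multiplied by their weights, plus the bias."""
    return sum(a * w for a, w in zip(activations, weights)) + bias

def classify(activations, weights, bias):
    """Positive decision value: intelligible; otherwise unintelligible."""
    if decision_value(activations, weights, bias) > 0:
        return "intelligible"
    return "unintelligible"  # includes the unspecified tie case (assumption)

# Hypothetical weights: voxels 0 and 2 respond more to intelligible speech,
# voxel 1 responds more to unintelligible sounds.
weights = [0.5, -0.25, 1.0]
bias = -0.1

trial_a = [1.0, 0.0, 0.5]    # hypothetical unseen volume
trial_b = [-1.0, 2.0, 0.0]   # hypothetical unseen volume

print(classify(trial_a, weights, bias), classify(trial_b, weights, bias))
```

Because the decision depends on the whole weighted pattern, a single voxel's weight says little on its own, which is the caution raised at the end of the preceding paragraph.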

Functional images were unwarped, realigned to the first acquired 
volume, and normalized, but not smoothed, using SPM8. Linear and 
quadratic trends were removed, and the data were z-score transformed 
by run within voxel (to remove amplitude differences 
between runs) and by trial across voxels (to remove overall amplitude 
differences between individual trials). This removes differences in the 
overall amplitude of the signal between runs and ensures that classifi- 
cation is achieved based on differences in the pattern of voxel 
responses between the conditions, rather than on an overall increase 
in response to one condition over another in all the voxels within a 
searchlight or ROI. Training/test examples were constructed from 
single brain volumes. Data were separated into training and test sets 
by run to ensure that training data did not influence testing (Krieges- 
korte et al. 2009). The first classifier was trained on the first run and 
tested on the second, and vice versa for the second classifier. The 
"true" accuracy of classification was estimated by averaging the per- 
formance of the 2 classifiers for each subject. 
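The normalization and run-wise cross-validation scheme can be sketched as follows. This is an illustration under stated assumptions, not the analysis code: the data are a toy example with 3 voxels, the population standard deviation is assumed for the z-score, detrending is omitted, and a nearest-centroid classifier stands in for the linear SVM purely to keep the example self-contained.

```python
import statistics

def zscore(values):
    """z-score a list; constant lists map to zeros."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]

def normalize_run(run):
    """run: list of trials, each a list of voxel values."""
    columns = [zscore(col) for col in zip(*run)]   # each voxel across trials
    trials = [list(row) for row in zip(*columns)]
    return [zscore(trial) for trial in trials]     # each trial across voxels

def fit_centroids(trials, labels):
    """Stand-in classifier (not an SVM): mean pattern per class."""
    centroids = {}
    for label in set(labels):
        members = [t for t, l in zip(trials, labels) if l == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*members)]
    return centroids

def predict_centroid(centroids, trial):
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(trial, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def two_fold_accuracy(runs, labels, fit, predict):
    """Train on one run, test on the other, and average both accuracies."""
    accuracies = []
    for train, test in ((0, 1), (1, 0)):
        model = fit(runs[train], labels[train])
        hits = sum(predict(model, x) == y
                   for x, y in zip(runs[test], labels[test]))
        accuracies.append(hits / len(labels[test]))
    return sum(accuracies) / len(accuracies)

# Toy data: run 2 has a large global offset that normalization removes.
raw_runs = [
    [[3, 1, 0], [0, 1, 3], [3, 1, 0], [0, 1, 3]],
    [[100, 101, 103], [103, 101, 100], [100, 101, 103], [103, 101, 100]],
]
labels = [[1, -1, 1, -1], [-1, 1, -1, 1]]  # +1 intelligible, -1 unintelligible

runs = [normalize_run(run) for run in raw_runs]
accuracy = two_fold_accuracy(runs, labels, fit_centroids, predict_centroid)
print(accuracy)
```

Splitting by run keeps the training and test volumes independent, and normalizing within each run means the offset between the two toy runs does not affect classification.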

The activation patterns submitted to the classification analyses 
were defined using (1) a whole-brain searchlight approach in which 
classification was conducted at each and every voxel in turn using the 
surrounding local neighborhood of voxels (Kriegeskorte et al. 2006) 
and (2) using an anatomical mask in which classification was con- 
ducted using all voxels within the bilateral temporal cortex (including 
PAC), and an additional control region within the visual cortex. 

Searchlight Analyses 

Searchlight analyses were conducted on the whole-brain volume with 
each searchlight consisting of a cube of 7 x 7 x 7 voxels (14 mm x 14 
mm x 14 mm = 2744 mm³), using the searchmight toolbox 
(http://minerva.csbmb.princeton.edu/searchmight/, last accessed March 
25, 2013) (Pereira and Botvinick 2011) and a linear SVM with the 
margin equal to 1/the number of voxels in each searchlight (1/343). 
Classifications of each intelligibility contrast were conducted: (clear 
vs. rot) and (NV vs. rotNV). Classification maps were generated for 
each subject with the value at each voxel reflecting the classification 
accuracy of the surrounding local neighborhood (proportion correct). 
Chance level (0.5) was subtracted from each classification accuracy 
value to center the values on zero, and these images were then 
submitted to a random effects one-sample t-test using SPM8. 
Classification maps were thresholded at a peak level P < 0.001 (uncorrected) 
with FDR correction at the cluster level of q < 0.05. 
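The geometry of a searchlight and the group-level test on chance-centered accuracies can be sketched as follows. The per-subject accuracies are invented for illustration, and the sketch covers a single voxel; it does not reproduce the searchmight or SPM8 implementations or the cluster-level FDR correction.

```python
import math
import statistics

def cube_offsets(side=7):
    """Voxel offsets of a side x side x side cube centered on a voxel."""
    half = side // 2
    span = range(-half, half + 1)
    return [(dx, dy, dz) for dx in span for dy in span for dz in span]

def one_sample_t(values, popmean=0.0):
    """One-sample t statistic against popmean (sample standard deviation)."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.fmean(values) - popmean) / standard_error

VOXEL_SIZE_MM = 2
offsets = cube_offsets(7)
side_mm = 7 * VOXEL_SIZE_MM
volume_mm3 = side_mm ** 3

# Hypothetical accuracies (proportion correct) for 12 subjects at one voxel.
accuracies = [0.55, 0.60, 0.52, 0.58, 0.49, 0.61,
              0.57, 0.54, 0.63, 0.50, 0.56, 0.59]
CHANCE = 0.5
centered = [a - CHANCE for a in accuracies]  # zero now corresponds to chance
t_value = one_sample_t(centered)
print(len(offsets), volume_mm3, t_value > 0)
```

Centering on chance means that a positive t statistic indicates above-chance classification across subjects, with 11 degrees of freedom for a sample of 12.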

Anatomical Mask Analyses 

Classifications of each intelligibility contrast were also conducted 
using an anatomical mask that included all the voxels in the bilateral 
temporal and auditory cortices and in a control region (the inferior 
occipital gyrus). This allowed us to understand how each voxel within 
the bilateral temporal lobes contributed to the classification of intelligible 
speech. The linear SVM from the Spider toolbox 
(http://www.kyb.tuebingen.mpg.de/bs/people/spider/, last accessed March 25, 
2013), with the Andre optimization and a hard margin, was used to 
train and validate models. A large anatomical ROI was constructed 
that consisted of bilateral PAC, defined using the maximum prob- 
ability maps from the SPM anatomy toolbox (Eickhoff et al. 2005), 
and the bilateral superior, middle, and inferior temporal gyri taken 
from the AAL ROI library. These temporal lobe ROIs had previously 
been defined by hand on a brain matched to the MNI/ICBM template 
using the definitions of Tzourio-Mazoyer et al. (2002) and are avail- 
able via the Marsbar toolbox. An additional control region, the 
inferior occipital gyrus, was used to validate the analysis approach — 
this was also derived from the AAL library. For each intelligibility 
classification, the classification accuracy and the weight vector were 
extracted. 



Results 

Univariate Analysis 

Okada et al. presented the whole-brain univariate analysis for 
the main positive effect of intelligibility, that is, the average of 
the response to clear and NV relative to the average of rot and 
rotNV. Here, we conduct a full factorial analysis to examine 
the neural response associated with the main positive effect 
of intelligibility and spectral detail, and their interaction. An 
idealized intelligibility response would be described by a 
region that responded equivalently and positively to the 2 in- 
telligible conditions (clear and NV), and equivalently (and 
negatively) to the 2 unintelligible conditions (rot and rotNV). 
An interaction between these factors might identify regions 
showing a differential response to the 2 simple intelligibility 
effects: (clear - rot) and (NV - rotNV). 
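The contrasts of this 2 x 2 design can be written as weight vectors over the 4 conditions. The sketch below applies them to an idealized intelligibility response; the beta values are hypothetical and the names are illustrative.

```python
CONDITIONS = ("clear", "NV", "rot", "rotNV")

CONTRASTS = {
    "intelligibility": (1, 1, -1, -1),   # (clear + NV) - (rot + rotNV)
    "spectral_detail": (1, -1, 1, -1),   # (clear + rot) - (NV + rotNV)
    "interaction": (1, -1, -1, 1),       # (clear - rot) - (NV - rotNV)
    "clear_minus_rot": (1, 0, -1, 0),    # simple intelligibility effect
    "NV_minus_rotNV": (0, 1, 0, -1),     # simple intelligibility effect
}

def apply_contrast(weights, betas):
    """Weighted sum of the condition betas."""
    return sum(w * betas[c] for w, c in zip(weights, CONDITIONS))

# Idealized response: equal and positive for intelligible conditions,
# equal and negative for unintelligible conditions (hypothetical values).
idealized = {"clear": 1.0, "NV": 1.0, "rot": -1.0, "rotNV": -1.0}
effects = {name: apply_contrast(w, idealized) for name, w in CONTRASTS.items()}
print(effects)
```

For the idealized response, the intelligibility contrast is large while the spectral detail and interaction contrasts are zero, because the 2 simple intelligibility effects are equal.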

Main Positive Effect of Spectral Detail: 
(clear + rot) - (NV + rotNV) 

The main effect of spectral detail gave rise to bilateral clusters 
of activation focused within the PAC [12.3% in the left and 
15.1% of the cluster in the right were located in PAC region 
TE 1.0 and 1.1 (Morosan et al. 2001)] and along the length of 
the superior temporal gyrus (STG; Fig. 1). Peak level acti- 
vations were found in PAC and the STG bilaterally. This ro- 
bustly bilateral activation contrasts with Scott et al. (2000), 
which evidenced a solely right lateralized response to the 
equivalent contrast. The increased power of this study, in 
which there were many more repetitions of the stimuli and 
measurements of the neural response, is likely to explain this 
difference. 




Figure 1. Surface renderings of the main positive effect of spectral detail, peak level 
uncorrected P < 0.001, FDR cluster corrected at q < 0.05. Plots show the mean 
beta value with error bars representing the standard error of the mean corrected for 
repeated-measures comparisons. Note that since there was no implicit baseline, the 
plots are mean centered to the overall level of activation to all conditions. 

Main Positive Effect of Intelligibility: 
(clear + NV) - (rot + rotNV) 

The main effect of intelligibility was associated with clusters 
of activity that spread along the full length of the superior 
and middle temporal gyrus in the left, and the mid to anterior 
superior/middle temporal gyrus in the right hemisphere 
(Fig. 2A and Table 1). 0.6% of the clusters fell in TE 1.2 in the 
left hemisphere and 0.2% fell in TE 1.2 in the right hemi- 
sphere. Peak level activations were found in the bilateral STS, 
left fusiform gyrus, left parahippocampal gyrus, and the left 
hippocampus. The pattern of activation observed for this con- 
trast is very similar to that observed by Okada et al. Response 
plots are shown from peak activations in the left anterior and 
posterior STS and the right anterior STS (no peaks were 
found in the right posterior STS). Plots from the left posterior 
STS suggested that there was a larger relative positive differ- 
ence between NV and rotNV, when compared between clear 
versus rot (Fig. 2A, plot 1). Plots from bilateral anterior 
regions showed a seemingly more equivalent response 
between those contrasts (Fig. 2A, plots 2 and 3). 

Interaction (F-test) 

The interaction was associated with activation that spread pre- 
dominantly across the mid to posterior superior and middle 
temporal gyri bilaterally (Fig. 3) with peak level activations 
found in the bilateral STG and STS. We explored these inter- 
actions by examining the simple effects while inclusively 
masking for the interaction at the same threshold. This high- 
lighted 2 particularly interesting interaction patterns. One 
pattern was characterized by increased responses to rotated 
speech compared with all the other conditions; this was 
particularly true of the response in the right planum temporale 
(e.g. Fig. 3, plot 3). The second pattern was characterized 
by an increased response to clear, rot, and NV relative to 
rotNV, in the absence of evidence of a difference in the 
response between clear, rot, and NV (see the response in the 
left posterior STS, Fig. 3, plot 2). 

Simple Intelligibility Effects and Their Conjunction 
Consistent with the identification of a significant interaction, 
the 2 simple intelligibility effects, (clear - rot) and (NV - 
rotNV), gave rise to very different statistical maps (Fig. 4A,B). 
The contrast of (clear - rot) solely activated the left anterior 
STS. This was in contrast to (NV - rotNV), which gave rise to 
broad activation extending along the length of the STG and 
STS, middle temporal gyri, and extending into the inferior 
temporal gyri bilaterally. 1.7% and 4.1% of the cluster extended 
into TE 1.0 and 1.2 in the left hemisphere, respectively, 
and 9.2% and 8.4% into TE 1.0 and 1.2, respectively, in 
the right hemisphere. Peak level activations were found in the 
bilateral STG/STS. 

A conjunction analysis was carried out to isolate activations 
common to the 2 simple intelligibility effects. This statistical 
analysis reflects the original concept of the Scott et al. (2000) 
design, which used more than one intelligibility subtraction in 
an attempt to isolate a more acoustically invariant intelligibil- 
ity response. It has been noted that there has been confusion 
in the past concerning the interpretation of conjunction ana- 
lyses (Nichols et al. 2005). For the sake of clarity, there are 2 
commonly used conjunction analyses: The global null con- 
junction and the more recently introduced conjunction null 
analysis. Narain et al. (2003) conducted the global null con- 
junction of the 2 intelligibility contrasts. They found activation 
in the left anterior and posterior STS for this analysis. 
However, as they used the global null conjunction, it is only 
possible to draw the inference that there was an effect in one 
or more of the intelligibility contrasts, rather than necessarily 
in both of them. Here, unlike Narain et al., we conduct the 
conjunction null rather than the global null conjunction. Statistical 
maps of the conjunction null show voxels that survive 
the specified threshold across all the individual subtractions 
that make up the conjunction; this allows a stronger inference 
to be made that there is an intelligibility effect in all the 
intelligibility contrasts considered. The statistical map resulting 
from the conjunction null was identical to the simple effect of 
(clear - rot); that is, the activation generated by (NV - rotNV) 
encompassed the activation of (clear - rot), but not vice versa 
(Table 1). Note that the conjunction null map is not shown in 
the figure, as it was identical to (clear - rot). Our results 
therefore extend those of Narain et al. in showing that the left 
anterior STS was significantly activated by both individual 
intelligibility contrasts at a corrected threshold. 
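The logic of the conjunction null can be sketched with the minimum statistic: a voxel is retained only if every contrast exceeds the threshold on its own. The statistic values and threshold below are hypothetical, and the sketch ignores the cluster-level correction applied in the analysis.

```python
def suprathreshold(stat_map, threshold):
    """Voxel-wise test of a single contrast."""
    return [value > threshold for value in stat_map]

def conjunction_null(stat_maps, threshold):
    """A voxel survives only if its minimum statistic exceeds the threshold."""
    return [min(values) > threshold for values in zip(*stat_maps)]

# Hypothetical t values for 5 voxels in each simple intelligibility contrast.
clear_minus_rot = [4.5, 1.0, 5.0, 0.2, 3.9]
nv_minus_rotnv = [6.0, 7.2, 4.1, 0.5, 8.0]
THRESHOLD = 4.0  # illustrative value only

conjunction = conjunction_null([clear_minus_rot, nv_minus_rotnv], THRESHOLD)
print(conjunction)
```

In this toy example the wider (NV - rotNV) map encompasses the (clear - rot) map, so the conjunction reproduces the (clear - rot) map exactly, which mirrors the pattern reported above.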

To directly compare the intelligibility response across regions, 
we extracted data around the peaks identified by the main effect 
of intelligibility. These were located in left anterior [-58 2 -16], 
right anterior [56 2 -18], and left posterior STS [-62 -34 0] 
(Fig. 5E). Note that we did not extract data from the right posterior 
STS (shown in gray) as activation did not spread to that 
region. A 3 x 2 x 2 repeated-measures ANOVA with factors: 
Location (left anterior, right anterior, and left posterior), 
intelligibility (+/-), and spectral detail (+/-) was conducted. This 
showed there to be main effects of intelligibility (F1,11 = 116.648, 
P < 0.001) reflecting an increased response to intelligible speech 




Figure 2. (A) Surface renderings of the main positive effect of intelligibility, peak level uncorrected P < 0.001, FDR cluster corrected at q < 0.05. (B) Data averaged across the 
region for each condition, illustrating the intelligibility x spectral detail interaction. (C) Data showing (high spectral detail - low spectral detail conditions) by location, illustrating 
the spectral detail x location interaction. (D) Data showing (intelligible - unintelligible conditions) by location, illustrating the intelligibility x location interaction. All plots show the 
mean beta value and error bars representing the standard error of the mean corrected for repeated-measures comparisons. 



Table 1 

Peak level activations for the main positive effect of intelligibility and the conjunction null of the 
intelligibility contrasts at peak level P < 0.001 uncorrected, FDR cluster corrected at q < 0.05 

Location                      MNI x     y     z   Extent      Z 

Main effect of intelligibility 
Left anterior STS               -58     2   -16     2770   7.16 
Left posterior STS              -62   -34     0            6.28 
Left posterior STS              -52   -50    18            4.49 
Right anterior STS               56     2   -18      401   5.43 
Right anterior STS               52    12   -18            5.22 
Right mid-STS                    50   -22    -6      183   4.09 
Left fusiform gyrus             -24   -30   -24      162   3.73 
Left parahippocampal gyrus      -20   -24   -20            3.71 
Left hippocampus                -24   -12   -18            3.67 

Conjunction null: (clear - rot) ∩ (NV - rotNV) 
Left anterior STS               -58     4   -20      256   4.48 
Left anterior STS               -60    -2   -12            4.07 

MNI, Montreal Neurological Institute 

when compared with unintelligible sounds, and a main effect of 
spectral detail (F1,11 = 13.731, P = 0.003) reflecting an increased 
response to conditions containing spectral detail compared 
with those without. There was no main effect of location 
(F2,22 = 0.750, P = 0.484). The interactions of spectral detail x 
intelligibility (F1,11 = 36.767, P < 0.001), location x spectral detail 
(F2,22 = 4.727, P = 0.020), and location x intelligibility (F2,22 = 
10.372, P = 0.001) were significant. The 3-way interaction was 
not significant, showing that there was no evidence of a difference 
in the interaction between intelligibility and spectral detail, 
as expressed across the regions (F2,22 = 2.284, P = 0.126). We 
decomposed the 2-way interactions by examining simple effect 
contrasts. In the case of the interaction between intelligibility 
and spectral detail, there was a significant difference between 
clear and rot (t(11) = 4.592, P = 0.001), NV and rotNV 
(t(11) = 10.818, P < 0.001), and rot and rotNV (t(11) = 6.484, 
P < 0.001) in the absence of a difference between clear and NV 
(t(11) = 0.591, P = 0.566; Fig. 2B). The interaction was driven by a 
relative deactivation to rotNV; in other words, there was no 
evidence that the intelligible conditions (NV and clear) differed 
from one another, but the unintelligible conditions (rot and 
rotNV) differed from one another dependent on the level of 
spectral detail. The interaction between location and spectral 
detail was driven by a larger response to conditions containing 
spectral detail, when compared with those without, in the left 
anterior STS when compared with the right anterior STS 
Figure 3. Surface renderings of the interaction between spectral detail and 
intelligibility, peak level uncorrected P< 0.001, FDR cluster corrected at q < 0.05. 
Plots show the mean beta value with error bars representing the standard error of 
the mean corrected for repeated-measures comparisons. 



(t(11) = 3.359, P = 0.006) and the difference between the left
posterior when compared with the right anterior STS was
marginally significant (t(11) = 2.192, P = 0.051), in the absence of a
difference between the left anterior and left posterior STS
(t(11) = 0.289, P = 0.778; Fig. 2C). The interaction between
location and intelligibility was driven by a larger response to
intelligible speech, when compared with unintelligible sounds, in
the left anterior STS as contrasted with both the left posterior
STS (t(11) = 3.831, P = 0.003) and the right anterior STS
(t(11) = 5.118, P < 0.001), in the absence of a difference between
the left posterior and right anterior STS (t(11) = 0.106, P = 0.918;
Fig. 2D). This demonstrated that the left anterior STS exhibited
the strongest univariate intelligibility effect.



Multivariate Pattern Analysis 

Searchlight Analyses 

Okada et al. conducted pattern classifications within single 
small cubes of data in the left and right anterior and posterior 
STS (and the right mid-STS), defined in each subject by identifying
local maxima within these regions in the univariate contrast
of (clear - rot). We extend their analysis by conducting a
Figure 4. Surface renderings of the simple effects of (A) (clear - rot),
(B) (NV - rotNV), peak level uncorrected P < 0.001, FDR cluster corrected at
q < 0.05. Note that the conjunction null (not shown) is identical to the response to
(clear - rot). Plot shows the mean beta value with error bars representing the
standard error of the mean corrected for repeated-measures comparisons.



searchlight analysis in which we examine local multivariate 
information across the whole-brain volume. This has the 
advantage of allowing us to more fully probe the response 
within the temporal lobes rather than just within a small 
number of selective sites, as well as allowing us to examine 
the multivariate response of frontal and parietal cortices, 
regions that have previously been shown to be associated 
with intelligibility responses (Davis and Johnsrude 2003; 
Obleser et al. 2007; Abrams et al. 2012). 

Random effects one-sample t-tests were conducted on
searchlight accuracy maps generated from the group of sub- 
jects. In the case of classifications of clear versus rotated 
speech (Fig. 5A), above-chance classification was identified 
within a fronto-temporo-parietal network, including the bilat- 
eral inferior frontal gyri (pars opercularis and triangularis), 
left angular gyrus, and both anterior and posterior superior/ 
middle temporal gyri. Small clusters were also identified in the 
right middle frontal gyrus, supplementary motor area, and 
bilateral insulae as well as the left thalamus and caudate 
nucleus. In the left hemisphere, 7.1% of the temporal lobe 
cluster fell within the PAC, and this was the case for 12.5% of 
the temporal lobe cluster in the right hemisphere. 
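
The group-level test described above can be sketched for a single voxel as a one-sample t-test of subject accuracies against chance (0.50). This is a minimal illustration using only the Python standard library, not the authors' pipeline: the accuracy values are hypothetical, the function name `one_sample_t` is an assumption, and the sample size of 12 is inferred from the 11 degrees of freedom reported elsewhere in the text. Cluster-level FDR correction is not shown.

```python
import math
import statistics


def one_sample_t(values, popmean):
    """Return (t, df) for a one-sample t-test of `values` against `popmean`."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("zero variance: t is undefined")
    t = (mean - popmean) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical searchlight accuracies (proportion correct) for 12 subjects
# at one voxel; these numbers are invented for illustration only.
accuracies = [0.62, 0.58, 0.66, 0.55, 0.61, 0.64,
              0.59, 0.57, 0.63, 0.60, 0.65, 0.56]

t_value, df = one_sample_t(accuracies, popmean=0.5)
print(f"t({df}) = {t_value:.3f}")
```

In a whole-brain analysis the same statistic would be computed at every voxel of the accuracy maps before thresholding.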
Figure 5. Whole-brain searchlight classifications of (A) (clear vs. rot), (B) (NV vs. rotNV), peak level P < 0.001 uncorrected, FDR cluster corrected at q < 0.05. (C) Voxels
commonly implicated in classifications of (clear vs. rot) and (NV vs. rotNV). (D) Plots of classifier accuracy (proportion correct) extracted for (clear vs. rot) and (NV vs. rotNV)
within the left anterior, left posterior, right anterior, and right posterior STS. Plots show mean accuracy with error bars representing the standard error of the mean corrected for
repeated-measures comparisons. (E) Regions of interest, 7 × 7 × 7 voxel cubes in the left anterior [-58 2 -16], left posterior [-62 -34 0], right anterior [56 2 -18], and right
posterior STS (gray) [62 -34 0] (MNI coordinates).



In the case of classifications of (NV vs. rotNV), above-chance 
classification was also identified in a bilateral fronto-temporo- 
parietal network, including the inferior frontal gyri (pars trian- 
gularis, pars opercularis, and pars orbitalis), inferior parietal 
cortices, and both anterior and posterior superior/middle tem- 
poral cortices (Fig. 5B). Additional small clusters were found in 
the bilateral cerebellum, left inferior temporal gyrus, precentral 
gyrus, supplementary motor area, precuneus, and superior and 
medial frontal gyrus. 3.3% of the left hemisphere cluster fell 
within TE 1.0, and 6.4% and 4.1% of the right hemisphere 
cluster fell within TE 1.0 and TE 1.2, respectively. 

There were a large number of voxels from clusters that
showed above-chance classification for both (clear vs. rot)
and (NV vs. rotNV). These voxels were found predominantly
in bilateral anterior and posterior temporal cortices, but small
numbers of voxels were also identified in the bilateral inferior
frontal gyri (pars opercularis and triangularis) and the left
angular gyrus (Fig. 5C).

The classification accuracies from the searchlight maps
from each subject were extracted from single voxels (the
classification scores at these locations were reflective of the
accuracy of the surrounding 7 × 7 × 7 voxels) at the same
locations in the left anterior, right anterior, and left posterior
STS that were examined in the earlier post hoc univariate
analyses (Fig. 5E). A right posterior STS ROI was constructed by
taking the homotopic equivalent of the left posterior STS
region (shown in gray). A 2 × 2 × 2 repeated-measures ANOVA
with factors hemisphere (left and right), position (anterior
and posterior), and contrast (clear vs. rot and NV vs. rotNV)
was conducted (Fig. 5D). There was a significant main effect
of position (F1,11 = 8.020, P = 0.016) with posterior regions
(M = 0.63, proportion correct) showing higher accuracy than
anterior regions (M = 0.59) and a main effect of contrast
(F1,11 = 13.991, P = 0.003) with classifications of NV versus
rotNV (M = 0.63) higher than clear versus rot (M = 0.59), but
no main effect of hemisphere (F1,11 = 0.802, P = 0.390). There
were no significant 2-way interactions: hemisphere × position
(F1,11 = 0.524, P = 0.484), hemisphere × contrast (F1,11 = 2.471,
P = 0.144), and position × contrast (F1,11 = 0.711, P = 0.417).
There was a significant 3-way interaction of hemisphere ×
position × contrast (F1,11 = 11.849, P = 0.006). Examining the
simple interaction effects, we identified that the simple
hemisphere × contrast interaction was significant in the
posterior (F1,11 = 6.721, P = 0.025) but not in the anterior
regions (F1,11 = 0.035, P = 0.858). We further examined the
second-order simple effects in the posterior regions; this
showed that accuracy in the left posterior STS was significantly
higher for NV versus rotNV when compared with clear versus
rot (P = 0.011), while there was no significant difference
between the contrasts in the right posterior STS (P = 0.567).
To summarize, classification accuracies were higher in the
posterior when compared with the anterior temporal cortex and
were higher for classifications of (NV vs. rotNV) when compared
with (clear vs. rot); this pattern was likely to be driven by the
high classification accuracy for (NV vs. rotNV) in the left posterior STS.
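
To make the searchlight geometry concrete, the sketch below enumerates the voxels in a 7 × 7 × 7 cube centred on a given voxel, which is the neighborhood whose accuracy is stored at that centre voxel. It is an illustrative sketch only: the function name `cube_neighborhood`, the volume shape, and the centre index are assumptions, coordinates are voxel indices rather than MNI millimetres, and clipping at the volume edge is one possible convention (the text does not say how edges were handled).

```python
from itertools import product


def cube_neighborhood(center, shape, edge=7):
    """List voxel indices in an edge^3 cube centred on `center`, clipped to the volume."""
    if edge % 2 == 0:
        raise ValueError("edge must be odd so that the cube has a centre voxel")
    half = edge // 2
    # One index range per axis, clipped so that it never leaves the volume.
    ranges = [
        range(max(c - half, 0), min(c + half, dim - 1) + 1)
        for c, dim in zip(center, shape)
    ]
    return list(product(*ranges))


# Hypothetical volume of 64 x 64 x 40 voxels and an interior centre voxel.
voxels = cube_neighborhood(center=(20, 30, 15), shape=(64, 64, 40))
print(len(voxels))  # 343, i.e. 7 * 7 * 7
```

A searchlight map is built by running a classifier on the patterns within each such neighborhood and writing the cross-validated accuracy back to the centre voxel.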

Anatomical Mask Analysis 

Okada et al. conducted classification analyses within small 
local neighborhoods of voxels within the temporal cortex. 
Here, we extend this analysis by conducting classifications 
using the entirety of the bilateral temporal cortex, including 
PAC. In doing so, we can gain insights into how multivariate 
information is integrated across the temporal cortex in the de- 
coding of intelligible speech. 

Classification was conducted using an ROI consisting of the
entire bilateral temporal cortex, including PAC, and a control
region, the inferior occipital gyrus. Cross validation using the
run structure demonstrated that the temporal cortex ROI correctly
classified 0.74 (proportion correct) of volumes of (clear vs. rot) and
0.79 of volumes of (NV vs. rotNV) (chance level
0.50; Fig. 6Ai,Bi); in both cases, classification was significantly
greater than chance (clear vs. rot: t(11) = 5.961,
P < 0.001; NV vs. rotNV: t(11) = 9.949, P < 0.001) when Bonferroni
correcting for 2 tests (P = 0.025). There was no evidence
that the inferior occipital gyrus performed at a level greater
than chance (clear vs. rot: P = 0.812; NV vs. rotNV: P = 0.967).
Having demonstrated above-chance classification using the
bilateral temporal cortex in both intelligibility classifications,
we extracted the weight vector from a classifier trained using
both runs of data for each intelligibility classification.
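
Cross-validation over the run structure can be sketched as follows: the classifier is trained on all runs but one and tested on the held-out run, so that training and test volumes never come from the same run. This section does not name the classifier, so the nearest-centroid rule below is a deliberately simple stand-in and not the authors' method; the function names and the toy two-voxel patterns are likewise assumptions for illustration.

```python
def centroid(patterns):
    """Voxel-wise mean of a list of equal-length patterns."""
    n = len(patterns)
    return [sum(column) / n for column in zip(*patterns)]


def squared_distance(a, b):
    """Squared Euclidean distance between two patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def leave_one_run_out_accuracy(patterns, labels, runs):
    """Hold out each run in turn and return the pooled proportion correct."""
    correct = 0
    for held_out in sorted(set(runs)):
        train = [(p, l) for p, l, r in zip(patterns, labels, runs) if r != held_out]
        test = [(p, l) for p, l, r in zip(patterns, labels, runs) if r == held_out]
        # Stand-in classifier: one centroid per class, from training runs only.
        centroids = {
            label: centroid([p for p, l in train if l == label])
            for label in {l for _, l in train}
        }
        for pattern, label in test:
            predicted = min(
                centroids, key=lambda c: squared_distance(pattern, centroids[c])
            )
            correct += int(predicted == label)
    return correct / len(patterns)


# Hypothetical two-voxel patterns from two runs (values invented).
patterns = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9],
            [1.1, 0.0], [0.8, 0.2], [0.0, 1.2], [0.2, 0.8]]
labels = ["clear", "clear", "rot", "rot", "clear", "clear", "rot", "rot"]
runs = [1, 1, 1, 1, 2, 2, 2, 2]

accuracy = leave_one_run_out_accuracy(patterns, labels, runs)
print(accuracy)
```

With two runs this reduces to training on one run and testing on the other, in both directions; the resulting per-subject accuracies are the values that would then be compared against the 0.50 chance level.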

For classifications of (clear vs. rot) and (NV vs. rotNV), the
weight vector values were multiplied by the averaged response
to the intelligible trials in each instance (i.e. the average of the
clear trials was multiplied by the weight vector from the
classification of (clear vs. rot) and the average of the NV trials was
multiplied by the weight vector from the classification of (NV vs.
rotNV)). This allowed us to understand how each voxel directly
contributed to the classifiers' prediction in favor of an intelligible
trial being classified as such for each intelligibility
classification. Voxels with resulting positive values contribute to
increasing the likelihood that a trial will be classified as an
intelligible trial. To understand how these values varied, each value
was expressed as a percentage of the largest overall positive
value and plotted against that value's ranked magnitude
expressed as a percentage of the total number of values. This plot
demonstrated that the values decayed rapidly in relation to the
largest positive value, with values ranking in the top 15%
accounting for the majority of the range (Fig. 6Aii,Bii). Note that
all the values in the top 15% were positive for every subject and
for both classifications. As all the values were positive, this
could be achieved by 1 of 2 mechanisms: either (1) the average
response pattern for the intelligible condition at a particular
voxel was positive and the weight vector value at that location
was also positive or (2) the average response pattern at a
particular voxel was negative and the weight vector value was also
negative. From the above, we can deduce that if a voxel
received a positive weight then the response at
that voxel for the averaged intelligible trial was also positive
(i.e. there was a relative increase in signal for intelligible when
compared with unintelligible trials), and if a voxel received a
negative weight then the response at that voxel for the averaged
intelligible trial was also negative (i.e. there was a relative
increase in signal for unintelligible when compared with intelligible
trials). In order to differentiate these effects at each voxel,
we separated positively and negatively weighted voxels when
generating binary maps of the voxels that contributed most
to classification, for voxels in the top 5%, 10%, and 15% of
values. These were then summed in order to establish the
confluence across subjects and visualized for different percentage
bandings.
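
The weight-map procedure above can be sketched for one subject and one classification as follows. Each voxel's contribution is its weight multiplied by the mean response to the intelligible trials; contributions are ranked, expressed as a percentage of the largest positive value, and the top-ranked voxels are written to two binary masks according to the sign of the weight. This is an illustrative reading of the text rather than the authors' code: the function names, the ranking by signed contribution, the rounding rule for the number of voxels kept, and all numbers are assumptions.

```python
def contribution_profile(weights, mean_response):
    """Return per-voxel contributions and a (rank %, value %) curve, largest first."""
    contributions = [w * r for w, r in zip(weights, mean_response)]
    largest = max(contributions)
    if largest <= 0:
        raise ValueError("no positive contribution to normalise by")
    ordered = sorted(contributions, reverse=True)
    n = len(ordered)
    curve = [
        (100.0 * (rank + 1) / n, 100.0 * value / largest)
        for rank, value in enumerate(ordered)
    ]
    return contributions, curve


def top_percent_masks(weights, contributions, percent):
    """Binary masks of the top `percent` of contributions, split by weight sign."""
    n_keep = max(1, len(contributions) * percent // 100)
    order = sorted(range(len(contributions)),
                   key=lambda i: contributions[i], reverse=True)
    keep = set(order[:n_keep])
    positive = [int(i in keep and weights[i] > 0) for i in range(len(weights))]
    negative = [int(i in keep and weights[i] < 0) for i in range(len(weights))]
    return positive, negative


# Hypothetical 10-voxel example (weights and mean responses invented).
weights = [0.5, -0.4, 0.1, -0.1, 0.2, 0.0, 0.3, -0.2, 0.05, -0.05]
mean_clear = [2.0, -1.5, 0.5, 0.2, 1.0, 0.3, -0.1, -1.0, 0.1, 0.4]

contributions, curve = contribution_profile(weights, mean_clear)
positive_mask, negative_mask = top_percent_masks(weights, contributions, percent=20)
print(positive_mask, negative_mask)
```

In this toy example the two retained voxels both have positive contributions, one through a positive weight and positive response and one through a negative weight and negative response, which is the distinction the two masks are meant to preserve.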

These maps show that, in the case of both intelligibility 
classifications, the voxels contributing most to the classifi- 
cation of intelligible speech, in which voxel activation was 
higher for intelligible speech when compared with unintelligi- 
ble sounds, were found in the left anterior temporal lobe 
(Fig. 6C,D). In contrast, voxels lateral and posterior to PAC in 
the STG and planum temporale (and to a lesser extent in the 
posterior STS) contributed to classification, but as a result of a 
relative increase to unintelligible sounds when compared 
with intelligible speech. We then identified positive and nega- 
tive weights featuring in the top 15% of weights common to 
more than 4 participants and to both intelligibility contrasts 
(Fig. 6E). This confirmed the previous observation that the 
left anterior temporal lobe was associated with informative 
voxels in which activation was higher for intelligible speech 
when compared with unintelligible sounds, with the center of 
mass for these weights located at [-56 -4 -12]. In contrast,
regions closer to PAC in the STG and planum temporale were
also informative, but reflected greater activation to unintelligible
sounds when compared with intelligible speech, with the
center of mass of voxels in the left hemisphere found at [-46
-32 13] and in the right hemisphere at [54 -27 13].
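
The step of finding voxels common to more than 4 participants and to both contrasts, and then locating their center of mass, might be sketched as below. The text leaves the exact order of operations open, so this sketch assumes the subject count is thresholded separately for each contrast before intersecting; the function names, the six toy subjects, and the unweighted center of mass are all assumptions for illustration.

```python
def concordance(masks):
    """Voxel-wise count of subjects whose binary mask includes each voxel."""
    return [sum(column) for column in zip(*masks)]


def common_voxels(masks_a, masks_b, min_subjects):
    """Flag voxels present in more than `min_subjects` subjects for both contrasts."""
    counts_a = concordance(masks_a)
    counts_b = concordance(masks_b)
    return [int(a > min_subjects and b > min_subjects)
            for a, b in zip(counts_a, counts_b)]


def center_of_mass(coords, mask):
    """Unweighted mean coordinate of the voxels selected by `mask`."""
    selected = [c for c, m in zip(coords, mask) if m]
    if not selected:
        raise ValueError("mask selects no voxels")
    return tuple(sum(axis) / len(selected) for axis in zip(*selected))


# Hypothetical masks for 6 subjects over 3 voxels, one list per subject.
masks_clear_rot = [[1, 1, 0]] * 5 + [[0, 1, 0]]
masks_nv_rotnv = [[1, 0, 0]] * 6
coords = [(-56, -4, -12), (-46, -32, 13), (54, -27, 13)]  # illustrative only

common = common_voxels(masks_clear_rot, masks_nv_rotnv, min_subjects=4)
print(common, center_of_mass(coords, common))
```

Positive and negative weight masks would be passed through this step separately so that the two response directions are never mixed.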

Discussion 

In this study, we replicated and extended the findings of 
Okada et al. (2010). Similar to Okada et al., we found bilateral 
activation spreading along much of the STS for the main posi- 
tive effect of intelligibility. Extending their findings, we de- 
monstrated that there was a significant interaction between 
intelligibility and spectral detail in the bilateral mid-posterior 
superior temporal cortex. The interaction at a location in the 
left posterior STS was driven by a reduced relative response 
to rotated-noise-vocoded speech, in the absence of observed 
differences between clear, rotated, and noise-vocoded speech. 
Consistent with the demonstration of a significant interaction, 
the simple intelligibility effects of (clear - rot) and (NV -
rotNV) gave rise to very different statistical maps and, indeed,
only the left anterior STS was activated by the conjunction 
null of the 2 intelligibility contrasts. Follow-up analyses de- 
monstrated that the intelligibility response in the left anterior 
STS was reliably stronger than in other regions. Further to the 
univariate analysis, we conducted multivariate analyses exam- 
ining the contribution of local pattern information to 
Figure 6. Classifier accuracy and weight maps from classifications using the entirety of the bilateral temporal cortex (including PAC) and the inferior occipital gyrus (used as a
control region). (Ai) Classifier accuracy (proportion correct) for (clear vs. rot). (Aii) Weight magnitude values of weights from (clear vs. rot). (Bi) Classifier accuracy (proportion
correct) for (NV vs. rotNV). (Bii) Weight magnitude values of weights from (NV vs. rotNV). (C) The most discriminative 5%, 10%, and 15% of classifier weights from a classifier
trained to discriminate (clear vs. rot). Color gradient indicates the degree of concordance across subjects. (D) The most discriminative 5%, 10%, and 15% of classifier weights
from a classifier trained to discriminate (NV vs. rotNV). Color gradient indicates the degree of concordance across subjects. (E) Weights featuring in the top 15% of weight
values common to both (clear vs. rot) and (NV vs. rotNV) and more than 4 subjects.



classifications of intelligibility. Here, we replicated, using a 
whole-brain searchlight analysis, the findings of Okada et al. 
in showing above-chance classification within the bilateral 
anterior and posterior STS for each of the intelligibility classi- 
fications, and showed that there was more information in the 
bilateral posterior when compared with the bilateral anterior 
STS, albeit this was driven mainly by the high classification of 
(NV vs. rotNV) in the left posterior STS. We conducted an 
additional analysis in which classification was conducted 
using the whole of bilateral temporal cortex, rather than using 
small local neighborhoods of voxels. Using classifier weights 
from this analysis, we showed that voxels contributing most 
to classification and exhibiting a relative increase in response 
to intelligible speech when compared with unintelligible 



sounds were found predominantly in the left anterior STS. In
contrast, bilateral STG and the planum temporale contributed
to classification, but as a result of a relative increase in
activation to unintelligible sounds.

The most robust univariate intelligibility effects were 
located in the left anterior STS consistent with our original 
Scott et al. (2000) study. Note that we assume that intelligibil- 
ity includes all stages of comprehension over and above early 
acoustic processing, and as such the intelligibility responses 
that we see likely reflect multiple processes including
acoustic-phonetic, semantic, and syntactic processing, and
associated representations. Okada et al. (2010) did not test for
interactions between intelligibility and spectral detail, present
the conjunction null of the simple effects, conduct
univariate ROI analyses to differentiate the response within
the anterior and posterior regions, or plot data from peak level
voxels. As a consequence, it is not possible to ascertain
whether the univariate effects demonstrated in their study
were also stronger in the left anterior STS or, indeed, whether
there were any regions that showed a differential magnitude
of response to the 2 intelligibility contrasts. We have
conducted a number of previous studies in which we have
compared speech with complex acoustic baselines. In these
studies, while activation in the left anterior STS has been con- 
sistently observed, responses in other temporal regions have 
been identified much less consistently (Scott et al. 2000, 2006; 
Narain et al. 2003; Spitsyna et al. 2006; Awad et al. 2007; Frie- 
derici et al. 2010). This may suggest that the univariate ampli- 
tude of response is lower, or that intersubject variability is 
higher in temporal regions outside of the left anterior STS. 
The classifier weight maps from the multivariate analyses 
provide converging evidence that increases in activation in 
anterior regions are associated with responses to intelligible 
speech. These findings are consistent with work showing that
the left anterior temporal cortex responds to species-specific
vocalizations in the monkey (Poremba et al. 2004) and that in 
humans "phonemic maps", voice responses, and the "semantic 
hub" reside in the anterior temporal cortex (von Kriegstein 
et al. 2003; Obleser et al. 2006; Patterson et al. 2007; Leaver 
and Rauschecker 2010). The weight maps also provide 
support for the notion of a hierarchical speech processing 
system (Davis and Johnsrude 2003) in which regions closest to 
PAC engage in predominantly acoustic processing and are 
driven robustly by complex nonspeech sounds, and those 
further from primary auditory regions, in the STS, exhibit a 
preferential response to linguistically relevant stimuli. 

A region of the left posterior STS was implicated in the 
interaction between intelligibility and spectral detail at the 
whole-brain level. This region was close, but slightly superior, 
to the center of the ROI in the left posterior STS examined in 
the post hoc univariate analyses (which had been defined 
based on the peak for the main effect of intelligibility). The 
peak implicated in the interaction showed no evidence of re- 
sponding differentially to clear, rotated, and noise-vocoded 
speech. Note that the response profile at this location (Fig. 3, 
plot 2) is almost identical to the profile at a similar location in 
the original Scott et al. (2000) study (Fig. 2, plot 2, p 2403). It 
may be that this region of the left posterior STS engages in 
acoustic phonetic processing that supports the resolution of 
intelligible percepts, explaining its elevated response to the 
unintelligible rotated speech condition as well as to the intelli- 
gible stimuli. Rotated speech contains phonetic features such 
as the absence/presence of voicing, but these features do not 
generally give rise to intelligible sounds; rotated-noise- 
vocoded speech in contrast does not contain any recognizable 
phonetic features explaining its relative deactivation. A 
number of regions of the temporal cortex, especially within 
the right planum temporale, responded more strongly to 
rotated speech than to any other condition. Rotated speech 
retains much of the spectro-temporal structure of speech, in- 
cluding formants and a quasi-harmonic structure, making it 
arguably the most appropriate nonspeech baseline used to 
date. It does however differ from speech in a number of ways. 
For example, the rotation of fricatives results in broadband 
energy in the low frequencies, a feature not ordinarily charac- 
teristic of speech, and while rotation maintains the equal 



spacing of the harmonics, it changes their absolute frequen- 
cies, giving rise to a slightly unusual pitch percept. These 
factors may contribute to why rotated speech drives some 
auditory regions more strongly than speech. We have recently 
developed alternative speech/nonspeech analogs by synthe- 
sizing, and further noise-vocoding, sinewaves tracking the for- 
mants of speech, and then combining the amplitude tracks 
from one sentence with the frequency tracks from another to 
generate unintelligible equivalents. These unintelligible sen- 
tences are arguably more closely matched acoustically and 
perceptually to their intelligible counterparts than rotated 
speech is to clear speech. In studies using these stimuli, we 
have found robust activation in the anterior and posterior 
temporal cortices when responses to the (mostly) intelligible 
condition were compared with the unintelligible condition 
(McGettigan and Evans et al. 2012; Rosen et al. 2011). 
However, as these studies did not directly compare the 
response of anterior with posterior regions, it is difficult to 
ascertain whether the response was stronger in anterior 
regions. As the intelligible condition was also degraded, these 
findings may be more informative in understanding the 
neural systems underlying effortful speech comprehension 
than speech comprehension more generally. 

The fact that there were some regions of the temporal 
cortex that responded more strongly to rotated than to clear 
speech highlights the need to account for how the relative 
magnitude of activation to each condition contributes to 
classification accuracy. It would be possible for a region to 
exhibit a high level of classification solely because of a rela- 
tive increase in signal to the unintelligible conditions, which 
would be hard to reconcile with the suggestion that an area 
coded a response to fully resolved intelligible speech percepts.
To address this, we conducted searchlight analyses in
which the mean level of activation had been removed for
each trial; this prevented classification from being achieved
as a result of a relative increase in signal across all the voxels
within a searchlight to one condition over another. By conducting
a searchlight analysis, rather than classification within
single small ROIs in the temporal cortex as was conducted by 
Okada et al., we were able to more fully map the response of 
the temporal cortex and regions beyond. We chose not to 
adopt the approach of Okada et al. in conducting classifi- 
cations of spectral detail and creating an "acoustic invariance 
index" (by comparing the relative accuracy of intelligibility 
and spectral detail classifications) because 6 band noise- 
vocoded and clear speech differ in both spectral detail and 
intelligibility, thus confounding the index. Indeed, one might 
imagine that differences in intelligibility between the clear 
and noise-vocoded speech would have been further exacer- 
bated by continuous data acquisition in which sounds were 
played over the competing noise of the scanner (in contrast to 
the sparse acquisition conducted in this study). 
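
The per-trial mean removal described above can be illustrated with a short sketch: subtracting the across-voxel mean from each trial's searchlight pattern discards any uniform offset between conditions, so only the spatial pattern remains for the classifier. The function name `remove_trial_mean` and the four-voxel trials are hypothetical, and the sketch assumes that the mean was taken across the voxels of one searchlight, which is how the passage reads.

```python
def remove_trial_mean(pattern):
    """Subtract the across-voxel mean from one trial's searchlight pattern."""
    mean = sum(pattern) / len(pattern)
    return [value - mean for value in pattern]


# Two hypothetical trials with the same spatial pattern; trial_b is simply
# trial_a shifted upwards by a uniform 2.0 across all voxels.
trial_a = [1.0, 2.0, 3.0, 6.0]
trial_b = [3.0, 4.0, 5.0, 8.0]

centered_a = remove_trial_mean(trial_a)
centered_b = remove_trial_mean(trial_b)
print(centered_a == centered_b)  # True: the uniform offset is gone
```

After centering, two conditions that differ only in overall response amplitude are indistinguishable, so any remaining above-chance classification must rest on differences in the pattern across voxels.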

Our results indicated that there was significant local infor- 
mation in the anterior and posterior STS bilaterally for both in- 
telligibility classifications (unlike in the whole-brain univariate 
analysis in which this was the case only in anterior regions), 
and that posterior regions contained relatively more infor- 
mation than anterior regions. The identification of greater 
information in posterior regions seemingly contradicts the 
finding that classifier weights in anterior areas were associated 
with increases in activity for intelligible speech and the obser- 
vation of stronger univariate intelligibility effects in the left 
anterior STS. This may, however, reflect the fact that there are
multiple ways that information, capable of discriminating 
responses to intelligible speech from unintelligible sounds, can 
be extracted from the fMRI signal. It remains to be seen which 
of the analysis methods best imitates the neural processes in- 
volved in encoding intelligible speech, or indeed whether our 
findings reflect the fact that intelligibility responses are 
encoded using multiple complementary coding systems. 

The searchlight analysis also identified significant infor- 
mation within the inferior frontal and inferior parietal cor- 
tices. These findings are in keeping with a recent searchlight 
analysis conducted by Abrams et al. (2012) in which the 
inferior frontal and parietal cortices were shown to discrimi- 
nate clear from rotated speech in the absence of a similar 
effect in those regions in an accompanying whole-brain uni- 
variate analysis. Indeed, while intelligibility responses have 
been shown in these regions previously (Davis and Johnsrude 
2003; Obleser et al. 2007), they are arguably less consistently 
identified than in temporal lobe regions (Abrams et al. 2012). 
Our results, like those of Abrams et al. (2012), suggest that 
univariate analyses underestimate the extent of regions in- 
volved in responding to intelligibility. Indeed, these findings 
suggest that local information resides in a network of regions 
including anterior and posterior temporal, inferior parietal, 
and inferior frontal cortices. This is in accord with the sugges- 
tion that the comprehension of speech recruits multiple 
streams of processing radiating anteriorly and posteriorly 
from primary auditory regions (Davis and Johnsrude 2007; 
Peelle et al. 2010). 

To conclude, we identified that the most robust univariate 
intelligibility effects were found in the left anterior STS, con- 
sistent with our previous findings. When multivariate classifi- 
cations were conducted in which information could be 
integrated across the anterior and posterior temporal cortices, 
increases in activation in anterior regions were shown to be 
maximally discriminative in classifying intelligible speech, 
again indicating the relative importance of the left anterior 
STS. However, when classification was conducted within 
small local neighborhoods in which the mean activation to 
each condition was removed, we found greater information in 
the posterior when compared with anterior regions, and 
identified a much wider intelligibility network that included 
the inferior parietal and frontal cortices. These results are con- 
sistent with the suggestion that the comprehension of spoken 
sentences engages multiple streams of processing, including 
an anterior stream that shows evidence of being most strongly 
engaged when data are analyzed using univariate methods, 
and a posterior stream or streams that can be identified more 
readily (at a whole-brain level) with multivariate pattern- 
based methods. Hence, as has been suggested recently by 
Abrams et al. (2012), inconsistencies in the literature may be 
reconciled by the use of more sensitive multivariate methods 
that allow the identification of a wider intelligibility network. 
The exact contribution of each component of this network is 
unknown, and we hope that future research will focus on at- 
tempting to understand how these components and their 
interaction contribute to resolving intelligibility. 